Search CORE

48 research outputs found

Hopfield Networks in Relevance and Redundancy Feature Selection Applied to Classification of Biomedical High-Resolution Micro-CT Images

Author: Auffarth B.
Cerquides J.
Lopez M.
Publication venue: Springer Heidelberg
Publication date: 17/07/2008
Field of study

We study filter-based feature selection methods for classification of biomedical images. For feature selection, we use two filters: a relevance filter, which measures the usefulness of individual features for target prediction, and a redundancy filter, which measures similarity between features. As a selection method that combines relevance and redundancy, we try out a Hopfield network. We experimentally compare selection methods, running unitary redundancy and relevance filters, against a greedy algorithm with redundancy thresholds [9], the min-redundancy max-relevance integration [8,23,36], and our Hopfield network selection. We conclude that, on the whole, Hopfield selection was one of the most successful methods, outperforming min-redundancy max-relevance when more features are selected.
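One of the baselines named above, the min-redundancy max-relevance integration, combines the two filter scores in a greedy loop. The sketch below is illustrative only and assumes the relevance and redundancy scores have already been computed by the two filters. The function name `mrmr_select` and the difference criterion (relevance minus mean redundancy with the features already chosen) are our own choices. This is not the Hopfield selection the paper proposes.

```python
from typing import List, Sequence


def mrmr_select(relevance: Sequence[float],
                redundancy: Sequence[Sequence[float]],
                k: int) -> List[int]:
    """Greedily pick k feature indices by relevance minus mean redundancy.

    relevance[i]: precomputed relevance score of feature i (hypothetical input).
    redundancy[i][j]: precomputed similarity between features i and j (hypothetical input).
    """
    n = len(relevance)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and the number of features")
    selected: List[int] = []
    remaining = set(range(n))
    while len(selected) < k:
        def score(i: int) -> float:
            # The first pick is driven by relevance alone.
            if not selected:
                return relevance[i]
            # Penalise candidates that resemble the features already chosen.
            mean_red = sum(redundancy[i][j] for j in selected) / len(selected)
            return relevance[i] - mean_red
        # Iterate in sorted order so ties go to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Features 0 and 1 are near-duplicates; feature 2 is weaker but distinct.
rel = [0.9, 0.85, 0.3]
red = [[1.0, 0.9, 0.0],
       [0.9, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(mrmr_select(rel, red, 2))  # [0, 2]
```

If features were ranked by relevance alone, features 0 and 1 would be selected. The redundancy penalty replaces the near-duplicate with the distinct feature 2.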

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

Author: Cerquides J.
Erik van Nimwegen
Lukas Burger
Publication venue: Nature Publishing Group
Publication date: 01/01/2008
Field of study

Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method, we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub’ nodes that distribute and integrate signals to and from up to tens of different interaction partners.

Crossref

edoc

PubMed Central

Alleviating Naive Bayes attribute independence assumption by attribute weighting

Author: Carman Mark J.
Cerquides Jesús
Webb Geoffrey I.
Zaidi Nayyar A.
Publication venue
Publication date: 24/05/2016
Field of study

Despite the simplicity of the Naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has therefore remained of great interest to the machine learning community. Of the numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than on less predictive ones. In this paper, we argue that for naive Bayes, attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log likelihood or the mean squared error objective function. We perform extensive evaluations and find that WANBIA is a competitive alternative to state-of-the-art classifiers like Random Forest, Logistic Regression and A1DE. © 2013 Nayyar A. Zaidi, Jesus Cerquides, Mark J. Carman and Geoffrey I. Webb.

This research has been supported by the Australian Research Council under grant DP110101427 and Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research under contract FA23861214030. The authors would like to thank Mark Hall for providing the code for CFS and MH. The authors would also like to thank anonymous reviewers for their insightful comments that helped improve the paper tremendously.

Peer Reviewed
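The sketch below illustrates the premise that attribute weights can compensate for violated independence. It assumes the common weighted naive Bayes form P(y | x) ∝ P(y) ∏ᵢ P(xᵢ | y)^wᵢ. It only evaluates the posterior for weights that are already given. It does not implement WANBIA's weight learning, which minimizes negative conditional log likelihood or mean squared error. The function name, parameters, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def weighted_nb_posterior(prior: Dict[str, float],
                          cond: Dict[str, List[Dict[str, float]]],
                          weights: Sequence[float],
                          x: Sequence[str]) -> Dict[str, float]:
    """Return P(y | x) under P(y | x) proportional to P(y) * prod_i P(x_i | y) ** w_i.

    prior[y]: class prior P(y).
    cond[y][i][v]: P(x_i = v | y) for attribute i.
    weights[i]: exponent w_i on attribute i (all 1.0 gives plain naive Bayes).
    """
    log_scores: Dict[str, float] = {}
    for y, p_y in prior.items():
        s = math.log(p_y)
        for i, v in enumerate(x):
            s += weights[i] * math.log(cond[y][i][v])
        log_scores[y] = s
    # Normalise with the log-sum-exp trick to avoid underflow.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {y: math.exp(s - m) / z for y, s in log_scores.items()}


# Attribute 1 is an exact copy of attribute 0, which breaks independence.
attr = {"a": {"t": 0.8, "f": 0.2}, "b": {"t": 0.2, "f": 0.8}}
prior = {"a": 0.5, "b": 0.5}
cond = {y: [attr[y], attr[y]] for y in prior}
x = ["t", "t"]
print(weighted_nb_posterior(prior, cond, [1.0, 1.0], x)["a"])  # ~0.941 (evidence counted twice)
print(weighted_nb_posterior(prior, cond, [0.5, 0.5], x)["a"])  # ~0.8
```

With unit weights, the duplicated attribute counts the same evidence twice. Giving each copy a weight of 0.5 recovers the posterior from a single copy. This toy case shows why weights chosen to fit the conditional likelihood can offset dependence between attributes.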

Digital.CSIC

Mixed Multi-unit Combinatorial Auctions for Supply Chain Management

Author: Cerquides J.
Endriss U.
Giovannucci A.
Rodríguez-Aguilar J.A.
Vinyals M.
Publication venue
Publication date: 01/01/2007
Field of study

International Migration, Integration and Social Cohesion online publications

UvA-DARE

De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

Author: AD Smith
AM Benotmane
C Linhart
CE Lawrence
CT Harbison
DJ Galas
DJ Lockhart
DJC MacKay
DS Johnson
E Redhead
E Wingender
G Mönke
G Pavesi
GA Wray
GK Sandve
H Wettig
Harmen J. Bussemaker
HM Wallach
IA Paponov
Ivan A. Paponov
Ivo Grosse
J Cerquides
J Davis
J Wu
Jan Grau
JC Bryne
JD Hughes
Jens Keilwagen
LM Hellman
LV Sun
M Tompa
Marc Strickert
NK Kim
O Elemento
S Sonnenburg
S Sonnenburg
Stefan Posch
T Ulmasov
T Ulmasov
TD Schneider
TJ Guilfoyle
TL Bailey
V Matys
VV Raghavan
W Ao
W Thompson
WA Thompson
WD Teale
Publication venue: Public Library of Science
Publication date: 10/02/2011
Field of study

Transcription factors are a main component of gene regulation, as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been fully solved. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin-responsive element predominantly positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point, such as the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a positional distribution from the data is beneficial for de-novo motif discovery.
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Unifying generative and discriminative learning principles

Author: A Bernal
A Culotta
A Feelders
A Mccallum
AE Kel
AY Ng
BP Lewis
C Burge
CM Bishop
D Cai
D Grossman
E Redhead
E Segal
E Wingender
F Pernkopf
G Bouchard
G Bouchard
G Stormo
G Yeo
H Wallach
H Wettig
HE Peckham
I Ben-Gal
Ivo Grosse
J Aldrich
J Cerquides
J Grau
J Keilwagen
J Keilwagen
JA Lasserre
Jan Grau
Jens Keilwagen
JH Xue
M Maragkakis
M Tompa
M Zhang
Marc Strickert
O Yakhnenko
P Grünwald
R Greiner
R Raina
R Staden
RA Fisher
S Sonnenburg
SL Salzberg
Stefan Posch
T Abeel
T Hastie
TH Kim
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Background: The recognition of functional binding sites in genomic DNA remains one of the fundamental challenges of genome research. During the last decades, a plethora of different and well-adapted models has been developed, but little attention has been paid to the development of different and similarly well-adapted learning principles. Only recently has it been noticed that discriminative learning principles can be superior to generative ones in diverse bioinformatics applications, too.

Results: Here, we propose a generalization of generative and discriminative learning principles that contains the maximum likelihood, maximum a posteriori, maximum conditional likelihood, maximum supervised posterior, generative-discriminative trade-off, and penalized generative-discriminative trade-off learning principles as special cases, and we illustrate its efficacy for the recognition of vertebrate transcription factor binding sites.

Conclusions: We find that the proposed learning principle helps to improve the recognition of transcription factor binding sites, enabling better computational approaches for extracting as much information as possible from valuable wet-lab data. We make all implementations available in the open-source library Jstacs so that this learning principle can be easily applied to other classification problems in the field of genome and epigenome analysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis

Author: A Bernal
A Culotta
A Feelders
AE Kel
AL Berger
AY Ng
C Burge
CM Bishop
D Cai
D Grossman
D Heckerman
D Klein
E Redhead
E Segal
F Pernkopf
G Yeo
GD Stormo
H Wallach
H Wettig
HE Peckham
I Ben-Gal
Ivo Grosse
J Cerquides
J Davis
J Goodman
J Grau
J Keilwagen
Jan Grau
Jens Keilwagen
L Narlikar
M Arita
M Meila-Predoviciu
M Tompa
M Zhang
MI Jordan
NK Kim
O Schulte
O Yakhnenko
P Grünwald
R Castelo
R Castelo
R Greiner
R Staden
S Chen
S Sonnenburg
SL Salzberg
Stefan Posch
T Fawcett
TH Kim
TM Chen
WL Buntine
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Background: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA, such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goal of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields, such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While many comparative studies have compared different learning principles or different statistical models, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, which may have led to questionable conclusions.

Results: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites.

Conclusions: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different priors has been neglected. We implement the derived prior in the open-source library Jstacs to enable easy application to comparative studies of different learning principles in the field of sequence analysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments

Author: A Bateman
A Fodor
A Fodor
B Bollobás
B Rost
C Chow
C Miller
C Yanovsky
CH Yeang
D Chiu
D Pollock
E Tillier
Erik van Nimwegen
F Pazos
G Gloor
G Shackelford
G Süel
J Cerquides
J Cheng
J Izarzugaza
K Wollenberg
L Burger
L Martin
Lukas Burger
M Fares
M Meilà
M Weigt
N Halabi
O Olmean
Philip E. Bourne
R Finn
R Gouveia-Oliveira
S Dunn
S Eddy
S Eddy
S Hunter
S Lindgreen
S Lockless
S Maisnier-Patin
T Kortemme
TM Cover
W Fitch
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are in contact in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures, we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to be in contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model, including these extensions, dramatically improves the accuracy of contact prediction from multiple sequence alignments.

Public Library of Science (PLOS)

Crossref

edoc

Directory of Open Access Journals

PubMed Central

Detecting and correcting the binding-affinity bias in ChIP-seq data using inter-species information

Author: A Mathelier
A Valouev
AL Gomes
B Alipanahi
C Notredame
CE Lawrence
D Jain
D Karolchik
D Park
D Villar
DA Nix
E Sober
EG Wilbanks
Hendrik Treutler
Ivo Grosse
J Felsenstein
J Hawkins
Jesus Cerquides
JH Elliott
L Teytelman
M Nettling
M Nowrousian
Martin Nettling
MB Rye
MG Ross
P Arnold
PJ Park
R Jothi
SG Landt
T Håndstad
TD Schneider
TL Bailey
TL Bailey
TS Furey
Y Zhang
Y Zhang
YL Jung
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

Crossref